library(haven)
library(dplyr)
library(countrycode)
library(here)
Combining WEO and Penn World Table Data in R
Introduction
In this tutorial, we will go through the steps to combine two significant economic datasets: the World Economic Outlook (WEO) database and the Penn World Table (PWT) dataset. Both datasets offer a wealth of information on various economic indicators but use different systems for country codes. We will harmonize these codes to merge the datasets for selected variables and countries.
Prerequisites
Before we begin, ensure that you have R installed and the following libraries ready:
haven
: for reading and writing Stata files.dplyr
: for data manipulation.countrycode
: for converting country codes.
Load the libraries by running the following commands:
Reading the Data
Let’s start by reading the WEO and PWT datasets. The data is assumed to be in Stata format (.dta
files) and located within a databases
folder in your working directory.
<- read_dta(here("databases/WEOApr2023Pub.dta"))
weo <- read_dta(here("databases/pwt100.dta")) penn
Selecting and Filtering Variables
Due to the large size of the databases, we will focus on a few key variables and countries. Specifically, we will look at economic data from the Netherlands, USA, and Japan.
From the WEO dataset, we select the country name, IMF country code, year, and real GDP (ngdp_r
).
<- weo %>%
weo select(country, ifscode, year, ngdp_r) %>%
filter(ifscode %in% c(111, 138, 158))
From the PWT dataset, we select the country code, year, and employment (emp
).
<- penn %>%
penn select(countrycode, year, emp) %>%
filter(countrycode %in% c("USA", "NLD", "JPN"))
Standardizing Country Codes
To merge the datasets, we need to standardize the country codes. The WEO uses numeric ifscode
, while PWT uses three-letter countrycode
. We will convert both to the ISO three-letter codes.
# For Penn, directly assign the country codes to a new column
$iso3 <- penn$countrycode
penn
# For WEO, convert IMF numeric codes to ISO three-letter codes
$iso3 <- countrycode(sourcevar = weo$ifscode, origin = "imf", destination = "iso3c") weo
In the coming future, the WEO will change to use ISO3 letter codes and you will not need to perform this conversion any more. This would, however, remain useful when merging to databases like Haver, that uses IFS codes.
Merging the Data
Now that we have a common country code system, we can merge the datasets based on these codes and the year.
<- merge(weo, penn, by = c("iso3", "year")) combined_data
Use tail()
to take a peek at the last few rows of your merged dataset:
tail(combined_data)[1:7]
iso3 year country ifscode ngdp_r countrycode emp
205 USA 2014 United States 111 16932.03 USA 148.2206
206 USA 2015 United States 111 17390.30 USA 150.2485
207 USA 2016 United States 111 17680.30 USA 152.3970
208 USA 2017 United States 111 18076.65 USA 154.6723
209 USA 2018 United States 111 18609.05 USA 156.6759
210 USA 2019 United States 111 19036.05 USA 158.2996
Conclusion
You now have a single dataset that combines key economic indicators from the WEO and PWT databases, standardized to use consistent country codes. This data is ready for analysis, allowing you to investigate trends and perform comparisons across the selected countries.